This is emacs-lisp-intro.info, produced by makeinfo version 4.0b from
emacs-lisp-intro.texi.
INFO-DIR-SECTION Emacs
START-INFO-DIR-ENTRY
* Emacs Lisp Intro: (eintr).
A simple introduction to Emacs Lisp programming.
END-INFO-DIR-ENTRY
This is an introduction to `Programming in Emacs Lisp', for people
who are not programmers.
Edition 2.04, 2001 Dec 17
Copyright (C) 1990, '91, '92, '93, '94, '95, '97, 2001 Free Software
Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Section being the Preface, with the Front-Cover Texts being
no Front-Cover Texts, and with the Back-Cover Texts being no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License".
File: emacs-lisp-intro.info, Node: Design count-words-region, Next: Whitespace Bug, Prev: count-words-region, Up: count-words-region
Designing `count-words-region'
------------------------------
First, we will implement the word count command with a `while' loop,
then with recursion. The command will, of course, be interactive.
The template for an interactive function definition is, as always:
(defun NAME-OF-FUNCTION (ARGUMENT-LIST)
  "DOCUMENTATION..."
  (INTERACTIVE-EXPRESSION...)
  BODY...)
What we need to do is fill in the slots.
The name of the function should be self-explanatory and similar to
the existing `count-lines-region' name. This makes the name easier to
remember. `count-words-region' is a good choice.
The function counts words within a region. This means that the
argument list must contain symbols that are bound to the two positions,
the beginning and end of the region. These two positions can be called
`beginning' and `end' respectively. The first line of the
documentation should be a single sentence, since that is all that is
printed as documentation by a command such as `apropos'. The
interactive expression will be of the form `(interactive "r")', since
that will cause Emacs to pass the beginning and end of the region to
the function's argument list. All this is routine.
The body of the function needs to be written to do three tasks:
first, to set up conditions under which the `while' loop can count
words, second, to run the `while' loop, and third, to send a message to
the user.
When a user calls `count-words-region', point may be at the
beginning or the end of the region. However, the counting process must
start at the beginning of the region. This means we will want to put
point there if it is not already there. Executing `(goto-char
beginning)' ensures this. Of course, we will want to return point to
its expected position when the function finishes its work. For this
reason, the body must be enclosed in a `save-excursion' expression.
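As a small illustration of what `save-excursion' does, here is a minimal sketch that is not part of the original text. The names `my-peek-at-beginning' and `my-save-excursion-demo' are made up for this example. The first function moves point to the beginning of the buffer inside a `save-excursion'; the second checks that point is back where it started once the expression finishes.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-peek-at-beginning ()
  "Return the first character of the buffer as a string, leaving point alone."
  (save-excursion
    (goto-char (point-min))             ; move point temporarily
    (buffer-substring (point) (min (1+ (point)) (point-max)))))

(defun my-save-excursion-demo ()
  "Return a list (RESULT POINT-BEFORE POINT-AFTER)."
  (with-temp-buffer
    (insert "one two three")            ; point is now at the end
    (let ((before (point))
          (result (my-peek-at-beginning)))
      (list result before (point)))))
```

Evaluating `(my-save-excursion-demo)' should return a list whose second and third elements are equal, showing that point was restored.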
The central part of the body of the function consists of a `while'
loop in which one expression jumps point forward word by word, and
another expression counts those jumps. The true-or-false-test of the
`while' loop should test true so long as point should jump forward, and
false when point is at the end of the region.
We could use `(forward-word 1)' as the expression for moving point
forward word by word, but it is easier to see what Emacs identifies as a
`word' if we use a regular expression search.
A regular expression search that finds the pattern for which it is
searching leaves point after the last character matched. This means
that a succession of successful word searches will move point forward
word by word.
As a practical matter, we want the regular expression search to jump
over whitespace and punctuation between words as well as over the words
themselves. A regexp that refuses to jump over interword whitespace
would never jump more than one word! This means that the regexp should
include the whitespace and punctuation that follows a word, if any, as
well as the word itself. (A word may end a buffer and not have any
following whitespace or punctuation, so that part of the regexp must be
optional.)
Thus, what we want for the regexp is a pattern defining one or more
word constituent characters followed, optionally, by one or more
characters that are not word constituents. The regular expression for
this is:
\w+\W*
The buffer's syntax table determines which characters are and are not
word constituents. (*Note What Constitutes a Word or Symbol?: Syntax,
for more about syntax. Also, see *Note Syntax: (emacs)Syntax, and
*Note Syntax Tables: (elisp)Syntax Tables.)
The search expression looks like this:
(re-search-forward "\\w+\\W*")
(Note that paired backslashes precede the `w' and `W'. A single
backslash has special meaning to the Emacs Lisp interpreter. It
indicates that the following character is interpreted differently than
usual. For example, the two characters, `\n', stand for `newline',
rather than for a backslash followed by `n'. Two backslashes in a row
stand for an ordinary, `unspecial' backslash.)
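To watch the search move point word by word, you can try a sketch like the following in a temporary buffer. It is an illustration only: the name `my-word-search-stops' is hypothetical, and the second and third arguments to `re-search-forward' (explained later in this chapter) are used here so that the loop simply ends when no more words are found.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-word-search-stops (text)
  "Return the positions where point lands after each word search in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((stops '()))
      ;; A third argument of `t' makes a failed search return nil
      ;; instead of signaling an error, which ends the loop.
      (while (re-search-forward "\\w+\\W*" nil t)
        (push (point) stops))
      (nreverse stops))))

;; The pattern is written with doubled backslashes in Lisp source,
;; but the string itself holds six characters: \ w + \ W *
(length "\\w+\\W*")
```

With the standard syntax table of a temporary buffer, `(my-word-search-stops "one two three")' returns `(5 9 14)': point lands just before `two', just before `three', and at the end of the buffer.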
We need a counter to count how many words there are; this variable
must first be set to 0 and then incremented each time Emacs goes around
the `while' loop. The incrementing expression is simply:
(setq count (1+ count))
Finally, we want to tell the user how many words there are in the
region. The `message' function is intended for presenting this kind of
information to the user. The message has to be phrased so that it
reads properly regardless of how many words there are in the region: we
don't want to say that "there are 1 words in the region". The conflict
between singular and plural is ungrammatical. We can solve this
problem by using a conditional expression that evaluates different
messages depending on the number of words in the region. There are
three possibilities: no words in the region, one word in the region,
and more than one word. This means that the `cond' special form is
appropriate.
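The three cases can be tried out on their own before they are built into the command. The following is a minimal sketch; the function name `my-word-count-message' is hypothetical, and it returns the message text as a string rather than displaying it, so that the result is easy to inspect.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-word-count-message (count)
  "Return a grammatical message describing COUNT words."
  (cond ((zerop count)
         "The region does NOT have any words.")
        ((= 1 count)
         "The region has 1 word.")
        (t
         (format "The region has %d words." count))))
```

For instance, `(my-word-count-message 1)' returns "The region has 1 word." while `(my-word-count-message 3)' returns "The region has 3 words."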
All this leads to the following function definition:
;;; First version; has bugs!
(defun count-words-region (beginning end)
  "Print number of words in the region.
Words are defined as at least one word-constituent
character followed by zero or more characters that
are not word-constituents. The buffer's syntax
table determines which characters these are."
  (interactive "r")
  (message "Counting words in region ... ")
;;; 1. Set up appropriate conditions.
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
;;; 2. Run the while loop.
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (setq count (1+ count)))
;;; 3. Send a message to the user.
      (cond ((zerop count)
             (message
              "The region does NOT have any words."))
            ((= 1 count)
             (message
              "The region has 1 word."))
            (t
             (message
              "The region has %d words." count))))))
As written, the function works, but not in all circumstances.
File: emacs-lisp-intro.info, Node: Whitespace Bug, Prev: Design count-words-region, Up: count-words-region
The Whitespace Bug in `count-words-region'
------------------------------------------
The `count-words-region' command described in the preceding section
has two bugs, or rather, one bug with two manifestations. First, if
you mark a region containing only whitespace in the middle of some
text, the `count-words-region' command tells you that the region
contains one word! Second, if you mark a region containing only
whitespace at the end of the buffer or the accessible portion of a
narrowed buffer, the command displays an error message that looks like
this:
Search failed: "\\w+\\W*"
If you are reading this in Info in GNU Emacs, you can test for these
bugs yourself.
First, evaluate the function in the usual manner to install it.
Here is a copy of the definition. Place your cursor after the closing
parenthesis and type `C-x C-e' to install it.
;; First version; has bugs!
(defun count-words-region (beginning end)
  "Print number of words in the region.
Words are defined as at least one word-constituent character followed
by zero or more characters that are not word-constituents. The buffer's
syntax table determines which characters these are."
  (interactive "r")
  (message "Counting words in region ... ")
;;; 1. Set up appropriate conditions.
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
;;; 2. Run the while loop.
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (setq count (1+ count)))
;;; 3. Send a message to the user.
      (cond ((zerop count)
             (message "The region does NOT have any words."))
            ((= 1 count) (message "The region has 1 word."))
            (t (message "The region has %d words." count))))))
If you wish, you can also install this keybinding by evaluating it:
(global-set-key "\C-c=" 'count-words-region)
To conduct the first test, set mark and point to the beginning and
end of the following line and then type `C-c =' (or `M-x
count-words-region' if you have not bound `C-c ='):
        one   two  three
Emacs will tell you, correctly, that the region has three words.
Repeat the test, but place mark at the beginning of the line and
place point just _before_ the word `one'. Again type the command `C-c
=' (or `M-x count-words-region'). Emacs should tell you that the
region has no words, since it is composed only of the whitespace at the
beginning of the line. But instead Emacs tells you that the region has
one word!
For the third test, copy the sample line to the end of the
`*scratch*' buffer and then type several spaces at the end of the line.
Place mark right after the word `three' and point at the end of line.
(The end of the line will be the end of the buffer.) Type `C-c =' (or
`M-x count-words-region') as you did before. Again, Emacs should tell
you that the region has no words, since it is composed only of the
whitespace at the end of the line. Instead, Emacs displays an error
message saying `Search failed'.
The two bugs stem from the same problem.
Consider the first manifestation of the bug, in which the command
tells you that the whitespace at the beginning of the line contains one
word. What happens is this: The `M-x count-words-region' command moves
point to the beginning of the region. The `while' tests whether the
value of point is smaller than the value of `end', which it is.
Consequently, the regular expression search looks for and finds the
first word. It leaves point after the word. `count' is set to one.
The `while' loop repeats, but this time the value of point is larger
than the value of `end', so the loop is exited and the function displays
a message saying the number of words in the region is one. In brief,
the regular expression search looks for and finds the word even though
it is outside the marked region.
In the second manifestation of the bug, the region is whitespace at
the end of the buffer. Emacs says `Search failed'. What happens is
that the true-or-false-test in the `while' loop tests true, so the
search expression is executed. But since there are no more words in
the buffer, the search fails.
In both manifestations of the bug, the search extends or attempts to
extend outside of the region.
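Both manifestations can be reproduced without marking a region by hand. The sketch below is an illustration only: `my-buggy-count-words' is a hypothetical, non-interactive copy of the first version that returns the count instead of displaying a message, and `condition-case' is used merely to capture the error so that both results can be returned in one list.

```elisp
;; Hypothetical, non-interactive copy of the first (buggy) version.
(defun my-buggy-count-words (beginning end)
  "Return the word count between BEGINNING and END, bugs included."
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")  ; unbounded: may leave the region
        (setq count (1+ count)))
      count)))

(defun my-whitespace-bug-demo ()
  "Return a list describing both manifestations of the bug."
  (with-temp-buffer
    (insert "   one two three   ")
    (list
     ;; Region is the three leading spaces (positions 1 to 4):
     ;; the search runs past END and counts `one'.
     (my-buggy-count-words 1 4)
     ;; Region is the three trailing spaces: no word follows,
     ;; so the search signals `search-failed'.
     (condition-case err
         (my-buggy-count-words 17 (point-max))
       (search-failed (car err))))))
```

Evaluating `(my-whitespace-bug-demo)' returns `(1 search-failed)': one spurious word for the leading whitespace, and an error for the trailing whitespace.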
The solution is to limit the search to the region--this is a fairly
simple action, but as you may have come to expect, it is not quite as
simple as you might think.
As we have seen, the `re-search-forward' function takes a search
pattern as its first argument. But in addition to this first,
mandatory argument, it accepts three optional arguments. The optional
second argument bounds the search. The optional third argument, if
`t', causes the function to return `nil' rather than signal an error if
the search fails. The optional fourth argument is a repeat count. (In
Emacs, you can see a function's documentation by typing `C-h f', the
name of the function, and then <RET>.)
In the `count-words-region' definition, the value of the end of the
region is held by the variable `end' which is passed as an argument to
the function. Thus, we can add `end' as an argument to the regular
expression search expression:
(re-search-forward "\\w+\\W*" end)
However, if you make only this change to the `count-words-region'
definition and then test the new version of the definition on a stretch
of whitespace, you will receive an error message saying `Search failed'.
What happens is this: the search is limited to the region, and fails
as you expect because there are no word-constituent characters in the
region. Since it fails, we receive an error message. But we do not
want to receive an error message in this case; we want to receive the
message that "The region does NOT have any words."
The solution to this problem is to provide `re-search-forward' with
a third argument of `t', which causes the function to return `nil'
rather than signal an error if the search fails.
However, if you make this change and try it, you will see the message
"Counting words in region ... " and ... you will keep on seeing that
message ..., until you type `C-g' (`keyboard-quit').
Here is what happens: the search is limited to the region, as before,
and it fails because there are no word-constituent characters in the
region, as expected. Consequently, the `re-search-forward' expression
returns `nil'. It does nothing else. In particular, it does not move
point, which it does as a side effect if it finds the search target.
After the `re-search-forward' expression returns `nil', the next
expression in the `while' loop is evaluated. This expression
increments the count. Then the loop repeats. The true-or-false-test
tests true because the value of point is still less than the value of
end, since the `re-search-forward' expression did not move point. ...
and the cycle repeats ...
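The behavior that causes the endless loop can be seen in isolation. Here is a minimal sketch, with the hypothetical name `my-failed-search-demo', that runs one bounded search over a stretch of whitespace and reports what the search returned and where point was before and after.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-failed-search-demo ()
  "Return (RESULT POINT-BEFORE POINT-AFTER) for a bounded search that fails."
  (with-temp-buffer
    (insert "   one two three")
    (goto-char (point-min))
    (let* ((end 4)                      ; region = the three leading spaces
           (before (point))
           (result (re-search-forward "\\w+\\W*" end t)))
      (list result before (point)))))
```

Evaluating `(my-failed-search-demo)' returns `(nil 1 1)': the search returned `nil', and point is still at position 1, which is less than the end of the region, so a `while' loop that tests only `(< (point) end)' would never stop.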
The `count-words-region' definition requires yet another
modification, to cause the true-or-false-test of the `while' loop to
test false if the search fails. Put another way, there are two
conditions that must be satisfied in the true-or-false-test before the
word count variable is incremented: point must still be within the
region and the search expression must have found a word to count.
Since both the first condition and the second condition must be true
together, the two expressions, the region test and the search
expression, can be joined with an `and' special form and embedded in
the `while' loop as the true-or-false-test, like this:
(and (< (point) end) (re-search-forward "\\w+\\W*" end t))
(*Note forward-paragraph::, for information about `and'.)
The `re-search-forward' expression returns a true (non-`nil') value if
the search succeeds and as a side effect moves point. Consequently, as words are
found, point is moved through the region. When the search expression
fails to find another word, or when point reaches the end of the
region, the true-or-false-test tests false, the `while' loop exits,
and the `count-words-region' function displays one or other of its
messages.
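As a quick check of the combined test, here is a minimal sketch of a non-interactive variant. The names `my-count-words' and `my-count-words-demo' are hypothetical; the first returns the count rather than displaying a message, and the second applies it to a whole line, to leading whitespace only, and to trailing whitespace only.

```elisp
;; Hypothetical, non-interactive variant using the combined test.
(defun my-count-words (beginning end)
  "Return the number of words between BEGINNING and END."
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))

(defun my-count-words-demo ()
  "Return counts for the whole line, leading and trailing whitespace."
  (with-temp-buffer
    (insert "   one two three   ")
    (list (my-count-words (point-min) (point-max))
          (my-count-words 1 4)
          (my-count-words 17 (point-max)))))
```

Evaluating `(my-count-words-demo)' returns `(3 0 0)': neither whitespace-only region produces a spurious word or an error.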
After incorporating these final changes, the `count-words-region'
function works without bugs (or at least, without bugs that I have found!).
Here is what it looks like:
;;; Final version: `while'
(defun count-words-region (beginning end)
  "Print number of words in the region."
  (interactive "r")
  (message "Counting words in region ... ")
;;; 1. Set up appropriate conditions.
  (save-excursion
    (let ((count 0))
      (goto-char beginning)
;;; 2. Run the while loop.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
;;; 3. Send a message to the user.
      (cond ((zerop count)
             (message
              "The region does NOT have any words."))
            ((= 1 count)
             (message
              "The region has 1 word."))
            (t
             (message
              "The region has %d words." count))))))
File: emacs-lisp-intro.info, Node: recursive-count-words, Next: Counting Exercise, Prev: count-words-region, Up: Counting Words
Count Words Recursively
=======================
You can write the function for counting words recursively as well as
with a `while' loop. Let's see how this is done.
First, we need to recognize that the `count-words-region' function
has three jobs: it sets up the appropriate conditions for counting to
occur; it counts the words in the region; and it sends a message to the
user telling how many words there are.
If we write a single recursive function to do everything, we will
receive a message for every recursive call. If the region contains 13
words, we will receive thirteen messages, one right after the other.
We don't want this! Instead, we must write two functions to do the
job, one of which (the recursive function) will be used inside of the
other. One function will set up the conditions and display the
message; the other will return the word count.
Let us start with the function that causes the message to be
displayed. We can continue to call this `count-words-region'.
This is the function that the user will call. It will be
interactive. Indeed, it will be similar to our previous versions of
this function, except that it will call `recursive-count-words' to
determine how many words are in the region.
We can readily construct a template for this function, based on our
previous versions:
;; Recursive version; uses regular expression search
(defun count-words-region (beginning end)
  "DOCUMENTATION..."
  (INTERACTIVE-EXPRESSION...)
;;; 1. Set up appropriate conditions.
  (EXPLANATORY MESSAGE)
  (SET-UP FUNCTIONS...
;;; 2. Count the words.
    RECURSIVE CALL
;;; 3. Send a message to the user.
    MESSAGE PROVIDING WORD COUNT))
The definition looks straightforward, except that somehow the count
returned by the recursive call must be passed to the message displaying
the word count. A little thought suggests that this can be done by
making use of a `let' expression: we can bind a variable in the varlist
of a `let' expression to the number of words in the region, as returned
by the recursive call; and then the `cond' expression, using that binding,
can display the value to the user.
Often, one thinks of the binding within a `let' expression as
somehow secondary to the `primary' work of a function. But in this
case, what you might consider the `primary' job of the function,
counting words, is done within the `let' expression.
Using `let', the function definition looks like this:
(defun count-words-region (beginning end)
  "Print number of words in the region."
  (interactive "r")
;;; 1. Set up appropriate conditions.
  (message "Counting words in region ... ")
  (save-excursion
    (goto-char beginning)
;;; 2. Count the words.
    (let ((count (recursive-count-words end)))
;;; 3. Send a message to the user.
      (cond ((zerop count)
             (message
              "The region does NOT have any words."))
            ((= 1 count)
             (message
              "The region has 1 word."))
            (t
             (message
              "The region has %d words." count))))))
Next, we need to write the recursive counting function.
A recursive function has at least three parts: the `do-again-test',
the `next-step-expression', and the recursive call.
The do-again-test determines whether the function will or will not be
called again. Since we are counting words in a region and can use a
function that moves point forward for every word, the do-again-test can
check whether point is still within the region. The do-again-test
should find the value of point and determine whether point is before,
at, or after the value of the end of the region. We can use the
`point' function to locate point. Clearly, we must pass the value of
the end of the region to the recursive counting function as an argument.
In addition, the do-again-test should also test whether the search
finds a word. If it does not, the function should not call itself
again.
The next-step-expression changes a value so that when the recursive
function is supposed to stop calling itself, it stops. More precisely,
the next-step-expression changes a value so that at the right time, the
do-again-test stops the recursive function from calling itself again.
In this case, the next-step-expression can be the expression that moves
point forward, word by word.
The third part of a recursive function is the recursive call.
Somewhere, we also need a part that does the `work' of the
function, a part that does the counting. A vital part!
But already, we have an outline of the recursive counting function:
(defun recursive-count-words (region-end)
  "DOCUMENTATION..."
  DO-AGAIN-TEST
  NEXT-STEP-EXPRESSION
  RECURSIVE CALL)
Now we need to fill in the slots. Let's start with the simplest
cases first: if point is at or beyond the end of the region, there
cannot be any words in the region, so the function should return zero.
Likewise, if the search fails, there are no words to count, so the
function should return zero.
On the other hand, if point is within the region and the search
succeeds, the function should call itself again.
Thus, the do-again-test should look like this:
(and (< (point) region-end)
(re-search-forward "\\w+\\W*" region-end t))
Note that the search expression is part of the do-again-test--the
`re-search-forward' function returns a true value if its search
succeeds and `nil' if it fails.
(*Note The Whitespace Bug in `count-words-region': Whitespace Bug, for
an explanation of how `re-search-forward' works.)
The do-again-test is the true-or-false test of an `if' clause.
Clearly, if the do-again-test succeeds, the then-part of the `if'
clause should call the function again; but if it fails, the else-part
should return zero since either point is outside the region or the
search failed because there were no words to find.
But before considering the recursive call, we need to consider the
next-step-expression. What is it? Interestingly, it is the search
part of the do-again-test.
In addition to returning a true value or `nil' for the do-again-test,
`re-search-forward' moves point forward as a side effect of a
successful search. This is the action that changes the value of point
so that the recursive function stops calling itself when point
completes its movement through the region. Consequently, the
`re-search-forward' expression is the next-step-expression.
In outline, then, the body of the `recursive-count-words' function
looks like this:
(if DO-AGAIN-TEST-AND-NEXT-STEP-COMBINED
    ;; then
    RECURSIVE-CALL-RETURNING-COUNT
  ;; else
  RETURN-ZERO)
How to incorporate the mechanism that counts?
If you are not used to writing recursive functions, a question like
this can be troublesome. But it can and should be approached
systematically.
We know that the counting mechanism should be associated in some way
with the recursive call. Indeed, since the next-step-expression moves
point forward by one word, and since a recursive call is made for each
word, the counting mechanism must be an expression that adds one to the
value returned by a call to `recursive-count-words'.
Consider several cases:
* If there are two words in the region, the function should return
one (for the first word) plus the number returned when it counts the
remaining words in the region, which in this case is one.
* If there is one word in the region, the function should return
one (for that word) plus the number returned when it counts the
remaining words in the region, which in this case is zero.
* If there are no words in the region, the function should return
zero.
From the sketch we can see that the else-part of the `if' returns
zero for the case of no words. This means that the then-part of the
`if' must return a value resulting from adding one to the value
returned from a count of the remaining words.
The expression will look like this, where `1+' is a function that
adds one to its argument.
(1+ (recursive-count-words region-end))
The whole `recursive-count-words' function will then look like this:
(defun recursive-count-words (region-end)
  "DOCUMENTATION..."
;;; 1. do-again-test
  (if (and (< (point) region-end)
           (re-search-forward "\\w+\\W*" region-end t))
;;; 2. then-part: the recursive call
      (1+ (recursive-count-words region-end))
;;; 3. else-part
    0))
Let's examine how this works:
If there are no words in the region, the else part of the `if'
expression is evaluated and consequently the function returns zero.
If there is one word in the region, the value of point is less than
the value of `region-end' and the search succeeds. In this case, the
true-or-false-test of the `if' expression tests true, and the then-part
of the `if' expression is evaluated. The counting expression is
evaluated. This expression returns a value (which will be the value
returned by the whole function) that is the sum of one added to the
value returned by a recursive call.
Meanwhile, the next-step-expression has caused point to jump over the
first (and in this case only) word in the region. This means that when
`(recursive-count-words region-end)' is evaluated a second time, as a
result of the recursive call, the value of point will be equal to or
greater than the value of region end. So this time,
`recursive-count-words' will return zero. The zero will be added to
one, and the original evaluation of `recursive-count-words' will return
one plus zero, which is one, which is the correct amount.
Clearly, if there are two words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
word--that is, it adds one to one, producing two, which is the correct
amount.
Similarly, if there are three words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
two words--and so on and so on.
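The walkthrough above can be checked in a temporary buffer. The following is a sketch under stated assumptions: the functions are given the hypothetical names `my-recursive-count-words' and `my-recursive-count-demo' so that nothing you have already installed is redefined, and the region is taken to be the whole of the temporary buffer.

```elisp
;; Hypothetical copies under new names, for illustration only.
(defun my-recursive-count-words (region-end)
  "Return the number of words between point and REGION-END."
  (if (and (< (point) region-end)
           (re-search-forward "\\w+\\W*" region-end t))
      ;; then-part: one for the word just found, plus the rest
      (1+ (my-recursive-count-words region-end))
    ;; else-part: no words left
    0))

(defun my-recursive-count-demo (text)
  "Return the number of words in TEXT, counted recursively."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (my-recursive-count-words (point-max))))
```

Evaluating `(my-recursive-count-demo "one two three")' returns 3, and an empty or whitespace-only string gives 0. Note that each word adds one level of recursion, so on a very large region a function written this way may run into the limit set by `max-lisp-eval-depth'.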
With full documentation the two functions look like this:
The recursive function:
(defun recursive-count-words (region-end)
  "Number of words between point and REGION-END."
;;; 1. do-again-test
  (if (and (< (point) region-end)
           (re-search-forward "\\w+\\W*" region-end t))
;;; 2. then-part: the recursive call
      (1+ (recursive-count-words region-end))
;;; 3. else-part
    0))
The wrapper:
;;; Recursive version
(defun count-words-region (beginning end)
  "Print number of words in the region.
Words are defined as at least one word-constituent
character followed by zero or more characters that are
not word-constituents. The buffer's syntax table
determines which characters these are."
  (interactive "r")
  (message "Counting words in region ... ")
  (save-excursion
    (goto-char beginning)
    (let ((count (recursive-count-words end)))
      (cond ((zerop count)
             (message
              "The region does NOT have any words."))
            ((= 1 count)
             (message "The region has 1 word."))
            (t
             (message
              "The region has %d words." count))))))
File: emacs-lisp-intro.info, Node: Counting Exercise, Prev: recursive-count-words, Up: Counting Words
Exercise: Counting Punctuation
==============================
Using a `while' loop, write a function to count the number of
punctuation marks in a region--period, comma, semicolon, colon,
exclamation mark, and question mark. Do the same using recursion.
File: emacs-lisp-intro.info, Node: Words in a defun, Next: Readying a Graph, Prev: Counting Words, Up: Top
Counting Words in a `defun'
***************************
Our next project is to count the number of words in a function
definition. Clearly, this can be done using some variant of
`count-words-region'. *Note Counting Words: Repetition and Regexps:
Counting Words. If we are just going to count the words in one
definition, it is easy enough to mark the definition with the `C-M-h'
(`mark-defun') command, and then call `count-words-region'.
However, I am more ambitious: I want to count the words and symbols
in every definition in the Emacs sources and then print a graph that
shows how many functions there are of each length: how many contain 40
to 49 words or symbols, how many contain 50 to 59 words or symbols, and
so on. I have often been curious how long a typical function is, and
this will tell.
* Menu:
* Divide and Conquer::
* Words and Symbols:: What to count?
* Syntax:: What constitutes a word or symbol?
* count-words-in-defun:: Very like `count-words'.
* Several defuns:: Counting several defuns in a file.
* Find a File:: Do you want to look at a file?
* lengths-list-file:: A list of the lengths of many definitions.
* Several files:: Counting in definitions in different files.
* Several files recursively:: Recursively counting in different files.
* Prepare the data:: Prepare the data for display in a graph.
File: emacs-lisp-intro.info, Node: Divide and Conquer, Next: Words and Symbols, Prev: Words in a defun, Up: Words in a defun
Divide and Conquer
==================
Described in one phrase, the histogram project is daunting; but
divided into numerous small steps, each of which we can take one at a
time, the project becomes less fearsome. Let us consider what the
steps must be:
* First, write a function to count the words in one definition. This
includes the problem of handling symbols as well as words.
* Second, write a function to list the numbers of words in each
function in a file. This function can use the
`count-words-in-defun' function.
* Third, write a function to list the numbers of words in each
function in each of several files. This entails automatically
finding the various files, switching to them, and counting the
words in the definitions within them.
* Fourth, write a function to convert the list of numbers that we
created in step three to a form that will be suitable for printing
as a graph.
* Fifth, write a function to print the results as a graph.
This is quite a project! But if we take each step slowly, it will
not be difficult.
File: emacs-lisp-intro.info, Node: Words and Symbols, Next: Syntax, Prev: Divide and Conquer, Up: Words in a defun
What to Count?
==============
When we first start thinking about how to count the words in a
function definition, the first question is (or ought to be) what are we
going to count? When we speak of `words' with respect to a Lisp
function definition, we are actually speaking, in large part, of
`symbols'. For example, the following `multiply-by-seven' function
contains the five symbols `defun', `multiply-by-seven', `number', `*',
and `7'. In addition, in the documentation string, it contains the
four words `Multiply', `NUMBER', `by', and `seven'. The symbol
`number' is repeated, so the definition contains a total of ten words
and symbols.
(defun multiply-by-seven (number)
  "Multiply NUMBER by seven."
  (* 7 number))
However, if we mark the `multiply-by-seven' definition with `C-M-h'
(`mark-defun'), and then call `count-words-region' on it, we will find
that `count-words-region' claims the definition has eleven words, not
ten! Something is wrong!
The problem is twofold: `count-words-region' does not count the `*'
as a word, and it counts the single symbol, `multiply-by-seven', as
containing three words. The hyphens are treated as if they were
interword spaces rather than intraword connectors: `multiply-by-seven'
is counted as if it were written `multiply by seven'.
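You can reproduce this count without marking anything. The sketch below is an illustration only; the name `my-count-words-in-string' is hypothetical, and it assumes that the variable `emacs-lisp-mode-syntax-table' is available in your Emacs, as it normally is. It applies the same regexp to a string, using the syntax table of Emacs Lisp mode.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT, using the Emacs Lisp syntax table."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let ((count 0))
        (while (re-search-forward "\\w+\\W*" nil t)
          (setq count (1+ count)))
        count))))

(my-count-words-in-string
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
```

The last expression returns 11, and `(my-count-words-in-string "multiply-by-seven")' returns 3, which shows where the extra words come from.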
The cause of this confusion is the regular expression search within
the `count-words-region' definition that moves point forward word by
word. In the canonical version of `count-words-region', the regexp is:
"\\w+\\W*"
This regular expression is a pattern defining one or more word
constituent characters possibly followed by one or more characters that
are not word constituents. What is meant by `word constituent
characters' brings us to the issue of syntax, which is worth a section
of its own.
File: emacs-lisp-intro.info, Node: Syntax, Next: count-words-in-defun, Prev: Words and Symbols, Up: Words in a defun
What Constitutes a Word or Symbol?
==================================
Emacs treats different characters as belonging to different "syntax
categories". For example, the regular expression, `\\w+', is a pattern
specifying one or more _word constituent_ characters. Word constituent
characters are members of one syntax category. Other syntax categories
include the class of punctuation characters, such as the period and the
comma, and the class of whitespace characters, such as the blank space
and the tab character. (For more information, see *Note Syntax:
(emacs)Syntax, and *Note Syntax Tables: (elisp)Syntax Tables.)
Syntax tables specify which characters belong to which categories.
Usually, a hyphen is not specified as a `word constituent character'.
Instead, it is specified as being in the `class of characters that are
part of symbol names but not words.' This means that the
`count-words-region' function treats it in the same way it treats an
interword white space, which is why `count-words-region' counts
`multiply-by-seven' as three words.
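The class of a character can be inspected with `char-syntax', which returns the one-character designator of the class. The following sketch assumes the standard syntax table; `demo-syntax-class' is a name made up for this illustration, and other modes may classify some of these characters differently.

```elisp
;; Hypothetical helper: look up CHAR in the standard syntax table and
;; return its class designator as a one-character string.
(defun demo-syntax-class (char)
  "Return the syntax class designator of CHAR as a string."
  (with-syntax-table (standard-syntax-table)
    (string (char-syntax char))))

;; "w" is word, "_" is symbol, "." is punctuation, " " is whitespace.
(mapcar #'demo-syntax-class '(?a ?- ?. ?\s))
;; => ("w" "_" "." " ")
```

The hyphen comes back as `_', the symbol class, which is why a search for `\\w+' stops at it.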
There are two ways to cause Emacs to count `multiply-by-seven' as
one symbol: modify the syntax table or modify the regular expression.
We could redefine a hyphen as a word constituent character by
modifying the syntax table that Emacs keeps for each mode. This action
would serve our purpose, except that a hyphen is merely the most common
character within symbols that is not typically a word constituent
character; there are others, too.
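A minimal sketch of the first approach follows. It works on a copy of the standard syntax table so that nothing global is changed; the function name is hypothetical. It also shows the limitation just mentioned: with the hyphen reclassified, the sample definition yields nine matches rather than ten, because `*' is still not a word constituent.

```elisp
;; Hypothetical demonstration: count "\\w+\\W*" matches in TEXT while
;; `-' is temporarily given word syntax in a private syntax table.
(defun demo-count-with-hyphen-as-word (text)
  "Count word matches in TEXT with `-' treated as a word constituent."
  (let ((table (copy-syntax-table (standard-syntax-table)))
        (count 0))
    (modify-syntax-entry ?- "w" table)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (with-syntax-table table
        (while (re-search-forward "\\w+\\W*" nil t)
          (setq count (1+ count)))))
    count))

(demo-count-with-hyphen-as-word
 (concat "(defun multiply-by-seven (number)\n"
         "  \"Multiply NUMBER by seven.\"\n"
         "  (* 7 number))"))
;; => 9
```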
Alternatively, we can redefine the regular expression used in the
`count-words-region' definition so as to include symbols. This
procedure has the merit of clarity, but the task is a little tricky.
The first part is simple enough: the pattern must match "at least one
character that is a word or symbol constituent". Thus:
"\\(\\w\\|\\s_\\)+"
The `\\(' is the first part of the grouping construct that includes the
`\\w' and the `\\s_' as alternatives, separated by the `\\|'. The
`\\w' matches any word-constituent character and the `\\s_' matches any
character that is part of a symbol name but not a word-constituent
character. The `+' following the group indicates that the word or
symbol constituent characters must be matched at least once.
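This first part can be tried on its own with `string-match'. The sketch below assumes the standard syntax table, and `demo-leading-token' is an invented name used only for this illustration.

```elisp
;; Hypothetical helper: return the first run of word or symbol
;; constituents found in STRING, or nil if there is none.
(defun demo-leading-token (string)
  "Return the first run of word or symbol constituents in STRING."
  (with-syntax-table (standard-syntax-table)
    (when (string-match "\\(\\w\\|\\s_\\)+" string)
      (match-string 0 string))))

(demo-leading-token "(multiply-by-seven 3)")
;; => "multiply-by-seven"

(demo-leading-token "(* 7 number)")
;; => "*"
```

The hyphenated name is matched as a single unit, and `*' is matched as well.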
However, the second part of the regexp is more difficult to design.
What we want is to follow the first part with "optionally one or more
characters that are not constituents of a word or symbol". At first, I
thought I could define this with the following:
"\\(\\W\\|\\S_\\)*"
The upper case `W' and `S' match characters that are _not_ word or
symbol constituents. Unfortunately, this expression matches any
character that is either not a word constituent or not a symbol
constituent. Since a character belongs to only one syntax class, no
character is both, so this matches any character!
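The flaw is easy to confirm. In this sketch the flawed pattern is anchored at both ends of the string with `\\`' and `\\'', so it succeeds only if every character is matched; the function name is hypothetical and the standard syntax table is assumed.

```elisp
;; Hypothetical check: does the flawed pattern swallow all of STRING,
;; word and symbol constituents included?
(defun demo-all-matched-p (string)
  "Return t if the flawed pattern matches every character of STRING."
  (with-syntax-table (standard-syntax-table)
    (and (string-match "\\`\\(\\W\\|\\S_\\)*\\'" string) t)))

(demo-all-matched-p "multiply-by-seven (* 7 number)")
;; => t
```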
I then noticed that every word or symbol in my test region was
followed by white space (blank space, tab, or newline). So I tried
placing a pattern to match one or more blank spaces after the pattern
for one or more word or symbol constituents. This failed, too. Words
and symbols are often separated by whitespace, but in actual code
parentheses may follow symbols and punctuation may follow words. So
finally, I designed a pattern in which the word or symbol constituents
are followed optionally by characters that are not white space and then
followed optionally by white space.
Here is the full regular expression:
"\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
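To see what each part of the pattern consumes, the successive matches can be collected into a list. This is a sketch under the assumption of the standard syntax table; `demo-collect-tokens' is a name invented for the illustration.

```elisp
;; Hypothetical helper: return, in order, the text consumed by each
;; successive match of the full regular expression in TEXT.
(defun demo-collect-tokens (text)
  "Return the successive matches of the full regexp in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table (standard-syntax-table)
      (let ((tokens nil))
        (while (re-search-forward
                "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" nil t)
          (setq tokens (cons (match-string 0) tokens)))
        (nreverse tokens)))))

(demo-collect-tokens "(* 7 number))")
;; => ("* " "7 " "number))")
```

The closing parentheses after `number' are absorbed by the middle part of the pattern, so they neither end the search early nor start a match of their own.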
File: emacs-lisp-intro.info, Node: count-words-in-defun, Next: Several defuns, Prev: Syntax, Up: Words in a defun
The `count-words-in-defun' Function
===================================
We have seen that there are several ways to write a
`count-words-region' function. To write a `count-words-in-defun', we
need merely adapt one of these versions.
The version that uses a `while' loop is easy to understand, so I am
going to adapt that. Because `count-words-in-defun' will be part of a
more complex program, it need not be interactive and it need not
display a message but just return the count. These considerations
simplify the definition a little.
On the other hand, `count-words-in-defun' will be used within a
buffer that contains function definitions. Consequently, it is
reasonable to ask that the function determine whether it is called when
point is within a function definition, and if it is, to return the
count for that definition. This adds complexity to the definition, but
saves us from needing to pass arguments to the function.
These considerations lead us to prepare the following template:
     (defun count-words-in-defun ()
       "DOCUMENTATION..."
       (SET UP...
          (WHILE LOOP...)
            RETURN COUNT)
As usual, our job is to fill in the slots.
First, the set up.
We are presuming that this function will be called within a buffer
containing function definitions. Point will either be within a
function definition or not. For `count-words-in-defun' to work, point
must move to the beginning of the definition, a counter must start at
zero, and the counting loop must stop when point reaches the end of the
definition.
The `beginning-of-defun' function searches backwards for an opening
delimiter such as a `(' at the beginning of a line, and moves point to
that position, or else to the limit of the search. In practice, this
means that `beginning-of-defun' moves point to the beginning of an
enclosing or preceding function definition, or else to the beginning of
the buffer. We can use `beginning-of-defun' to place point where we
wish to start.
The `while' loop requires a counter to keep track of the words or
symbols being counted. A `let' expression can be used to create a
local variable for this purpose, and bind it to an initial value of
zero.
The `end-of-defun' function works like `beginning-of-defun' except
that it moves point to the end of the definition. `end-of-defun' can
be used as part of an expression that determines the position of the
end of the definition.
The set up for `count-words-in-defun' takes shape rapidly: first we
move point to the beginning of the definition, then we create a local
variable to hold the count, and finally, we record the position of the
end of the definition so the `while' loop will know when to stop
looping.
The code looks like this:
     (beginning-of-defun)
     (let ((count 0)
           (end (save-excursion (end-of-defun) (point))))
The code is simple. The only slight complication is likely to concern
`end': it is bound to the position of the end of the definition by a
`save-excursion' expression that returns the value of point after
`end-of-defun' temporarily moves it to the end of the definition.
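The set up can be tried in isolation. The sketch below uses hypothetical names (`demo-defun-bounds' and `demo-text-of-second-defun') and a made-up two-definition buffer; it places point inside the second definition, finds the bounds in the way just described, and returns the text between them.

```elisp
;; Hypothetical helper: perform the set up and report what it found.
(defun demo-defun-bounds ()
  "Move point to the start of the definition and return (START . END)."
  (beginning-of-defun)
  (let ((end (save-excursion (end-of-defun) (point))))
    ;; `save-excursion' has restored point to the start by now.
    (cons (point) end)))

;; Hypothetical driver: a temporary buffer holding two tiny definitions.
(defun demo-text-of-second-defun ()
  "Return the text between the bounds found from inside a sample defun."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert "(defun first-one () 1)\n\n(defun second-one ()\n  2)\n")
    (search-backward "2")   ; leave point inside the second definition
    (let ((bounds (demo-defun-bounds)))
      (buffer-substring (car bounds) (cdr bounds)))))

(demo-text-of-second-defun)
```

The returned text should begin with `(defun second-one' and contain nothing from the first definition, since `beginning-of-defun' stops at the nearest opening parenthesis in the first column at or before point.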
The second part of the `count-words-in-defun', after the set up, is
the `while' loop.
The loop must contain an expression that jumps point forward word by
word and symbol by symbol, and another expression that counts the
jumps. The true-or-false-test for the `while' loop should test true so
long as point should jump forward, and false when point is at the end
of the definition. We have already redefined the regular expression
for this (*note Syntax::), so the loop is straightforward:
     (while (and (< (point) end)
                 (re-search-forward
                  "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t))
       (setq count (1+ count)))
The third part of the function definition returns the count of words
and symbols. This part is the last expression within the body of the
`let' expression, and can be, very simply, the local variable `count',
which when evaluated returns the count.
Put together, the `count-words-in-defun' definition looks like this:
     (defun count-words-in-defun ()
       "Return the number of words and symbols in a defun."
       (beginning-of-defun)
       (let ((count 0)
             (end (save-excursion (end-of-defun) (point))))
         (while
             (and (< (point) end)
                  (re-search-forward
                   "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
                   end t))
           (setq count (1+ count)))
         count))
How to test this? The function is not interactive, but it is easy to
put a wrapper around the function to make it interactive; we can use
almost the same code as for the recursive version of
`count-words-region':
     ;;; Interactive version.
     (defun count-words-defun ()
       "Number of words and symbols in a function definition."
       (interactive)
       (message
        "Counting words and symbols in function definition ... ")
       (let ((count (count-words-in-defun)))
         (cond
          ((zerop count)
           (message
            "The definition does NOT have any words or symbols."))
          ((= 1 count)
           (message
            "The definition has 1 word or symbol."))
          (t
           (message
            "The definition has %d words or symbols." count)))))
Let's re-use `C-c =' as a convenient keybinding:
(global-set-key "\C-c=" 'count-words-defun)
Now we can try out `count-words-defun': install both
`count-words-in-defun' and `count-words-defun', and set the keybinding,
and then place the cursor within the following definition:
     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
          => 10
Success! The definition has 10 words and symbols.
The next problem is to count the numbers of words and symbols in
several definitions within a single file.
File: emacs-lisp-intro.info, Node: Several defuns, Next: Find a File, Prev: count-words-in-defun, Up: Words in a defun
Count Several `defuns' Within a File
====================================
A file such as `simple.el' may have 80 or more function definitions
within it. Our long term goal is to collect statistics on many files,
but as a first step, our immediate goal is to collect statistics on one
file.
The information will be a series of numbers, each number being the
length of a function definition. We can store the numbers in a list.
We know that we will want to incorporate the information regarding
one file with information about many other files; this means that the
function for counting definition lengths within one file need only
return the list of lengths. It need not and should not display any
messages.
The word count commands contain one expression to jump point forward
word by word and another expression to count the jumps. The function
to return the lengths of definitions can be designed to work the same
way, with one expression to jump point forward definition by definition
and another expression to construct the lengths' list.
This statement of the problem makes it elementary to write the
function definition. Clearly, we will start the count at the beginning
of the file, so the first command will be `(goto-char (point-min))'.
Next, we start the `while' loop; and the true-or-false test of the loop
can be a regular expression search for the next function definition--so
long as the search succeeds, point is moved forward and then the body
of the loop is evaluated. The body needs an expression that constructs
the lengths' list. `cons', the list construction command, can be used
to create the list. That is almost all there is to it.
Here is what this fragment of code looks like:
     (goto-char (point-min))
     (while (re-search-forward "^(defun" nil t)
       (setq lengths-list
             (cons (count-words-in-defun) lengths-list)))
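The fragment can be exercised before any file handling exists by running it in a temporary buffer. In the sketch below, `count-words-in-defun' is copied from the earlier node so that the example stands alone, while `demo-lengths-of-defuns', the local binding of `lengths-list', and the two sample definitions are assumptions made for the illustration.

```elisp
;; Copied from the definition given earlier so the sketch stands alone.
(defun count-words-in-defun ()
  "Return the number of words and symbols in a defun."
  (beginning-of-defun)
  (let ((count 0)
        (end (save-excursion (end-of-defun) (point))))
    (while
        (and (< (point) end)
             (re-search-forward
              "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
              end t))
      (setq count (1+ count)))
    count))

;; Hypothetical wrapper: run the fragment over TEXT in a temporary buffer.
(defun demo-lengths-of-defuns (text)
  "Return the list of definition lengths found in TEXT."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert text)
    (goto-char (point-min))
    (let ((lengths-list nil))
      (while (re-search-forward "^(defun" nil t)
        (setq lengths-list
              (cons (count-words-in-defun) lengths-list)))
      lengths-list)))

(demo-lengths-of-defuns
 (concat "(defun multiply-by-seven (number)\n"
         "  \"Multiply NUMBER by seven.\"\n"
         "  (* 7 number))\n"
         "\n"
         "(defun add-one (n)\n"
         "  (1+ n))\n"))
;; => (5 10)
```

Because `cons' adds each new length to the front of the list, the lengths come out in the reverse of the order in which the definitions appear in the buffer.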
What we have left out is the mechanism for finding the file that
contains the function definitions.
In previous examples, we either used this, the Info file, or we
switched back and forth to some other buffer, such as the `*scratch*'
buffer.
Finding a file is a new process that we have not yet discussed.
File: emacs-lisp-intro.info, Node: Find a File, Next: lengths-list-file, Prev: Several defuns, Up: Words in a defun
Find a File
===========
To find a file in Emacs, you use the `C-x C-f' (`find-file')
command. This command is almost, but not quite right for the lengths
problem.
Let's look at the source for `find-file' (you can use the `find-tag'
command or `C-h f' (`describe-function') to find the source of a
function):
     (defun find-file (filename)
       "Edit file FILENAME.
     Switch to a buffer visiting file FILENAME,
     creating one if none already exists."
       (interactive "FFind file: ")
       (switch-to-buffer (find-file-noselect filename)))
The definition possesses short but complete documentation and an
interactive specification that prompts you for a file name when you use
the command interactively. The body of the definition contains two
functions, `find-file-noselect' and `switch-to-buffer'.
According to its documentation as shown by `C-h f' (the
`describe-function' command), the `find-file-noselect' function reads
the named file into a buffer and returns the buffer. However, the
buffer is not selected. Emacs does not switch its attention (or yours
if you are using `find-file-noselect') to the named buffer. That is
what `switch-to-buffer' does: it switches the buffer to which Emacs'
attention is directed; and it switches the buffer displayed in the
window to the new buffer. We have discussed buffer switching
elsewhere. (*Note Switching Buffers::.)
In this histogram project, we do not need to display each file on the
screen as the program determines the length of each definition within
it. Instead of employing `switch-to-buffer', we can work with
`set-buffer', which redirects the attention of the computer program to
a different buffer but does not redisplay it on the screen. So instead
of calling on `find-file' to do the job, we must write our own
expression.
The task is easy: use `find-file-noselect' and `set-buffer'.
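As a minimal sketch of that combination, the following function reads a file into a buffer, makes that buffer current for the program only, and reports its size. The names `demo-size-of-file' and `demo-size-of-scratch-file' are invented for this illustration, and wrapping the work in `save-current-buffer' is an assumption about how one might tidy up afterwards rather than something the text prescribes.

```elisp
;; Hypothetical helper: visit FILENAME without displaying it and
;; return the number of characters it holds.
(defun demo-size-of-file (filename)
  "Return the number of characters in FILENAME without displaying it."
  (save-current-buffer
    (set-buffer (find-file-noselect filename))
    (buffer-size)))

;; Hypothetical usage: write a small temporary file, measure it, and
;; delete the file again.  The buffer visiting it is left in place.
(defun demo-size-of-scratch-file ()
  "Create a temporary file holding five characters and return its size."
  (let ((file (make-temp-file "demo-lengths-")))
    (unwind-protect
        (progn
          (with-temp-file file (insert "hello"))
          (demo-size-of-file file))
      (delete-file file))))

(demo-size-of-scratch-file)
;; => 5
```

The buffer shown in the selected window does not change while this runs, and the buffer that was current beforehand is current again afterwards.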